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Abstract — This paper presents a framework aimed at 
monitoring the behavior of aircraft in a given airspace. 
Nominal trajectories are determined and learned using 
data driven methods. Standard procedures are used by air 
traffic controllers (ATC) to guide aircraft, ensure the safety 
of the airspace, and to maximize the runway occupancy. 
Even though standard procedures are used by ATC, the 
control of the aircraft remains with the pilots, leading 
to a large variability in the flight patterns observed. Two 
methods to identify typical operations and their variability 
from recorded radar tracks are presented. This knowledge 
base is then used to monitor the conformance of current 
operations against operations previously identified as stan- 
dard. A tool called AirTrajectoryMiner is presented, aiming 
at monitoring the instantaneous health of the airspace, in 
real time. The airspace is "healthy" when all aircraft are 
flying according to the nominal procedures. A measure 
of complexity is introduced, measuring the conformance 
of current flight to nominal flight patterns. When an 
aircraft does not conform, the complexity increases as more 
attention from ATC is required to ensure a safe separation 
between aircraft. 



Introduction 

To address the challenges of increase in air traffic 
volume, new technologies and procedures are being 
developped in the context of NextGen |T| in the US and 
SESAR |2| in Europe. Automation is a key element, 
necessary to achieve the goals set by those programs. 
New procedures involving more accurate navigation are 
predicted to increase the capacity of the airspace. An- 
alyzing trajectory records is a key element to assess 
the performances and the accuracy of new concepts of 
operations. Automated tools are needed to process the 
large amount of daily flights and corresponding records. 
This work presents two methods to cluster trajectories 
and identify flights that followed identical air routes. The 
first method is based on the identification of waypoints 
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in the trajectories, and the second method is based on 
a principal components analysis of resampled trajecto- 
ries. Operations in the terminal area are managed by 
Air Traffic Controllers (ATC) and are not part of the 
flight plans. It was therefore decided not to use any 
flight plan knowledge or aircraft intent other than the 
destination airport. Then using the knowledge gathered 
from the clustering methods, we propose a real time 
airspace monitoring tool that evaluates the conformance 
of current flight to pre-identified nominal trajectories. 
A measure of airspace complexity based on this con- 
formance is also proposed with the tool. The overall 
method developed in this paper is neither location nor 
data specific and can easily be adapted to other data 
sets since unsupervised methods are used, and the data 
is not labeled. This paper considers radar tracks in a 
terminal radar approach control (TRACON). However, 
the underlying principles may also be used for other 
applications, such as a GPS-equiped fleet of trucks. 
Since this paper deals with different problems such as 
trajectory clustering, airspace monitoring and airspace 
complexity, the literature review is spread along the 
paper at the begining of the corresponding section. The 
remainder of this paper is organized as follows: The 
first section presents the data set used for the study. 
The second section presents the trajectory clustering 
methods, and finally, before the concluding remarks, the 
third section introduces AirTrajectoryMiner, the airspace 
monitoring tool that detects in real time the aircraft that 
do not comply to nominal procedures. 

I. Available Data 

The available data [^consists of records of flight tracks 
over the San Francisco bay area, for the first 3 months 
of 2006. The records cover the Northern California 
TRACON (NCT), that is, a cyHnder of radius 80km and 
height 6,000m centered at Oakland international airport. 
The NCT contains 3 main airports — Oakland, San 
Francisco (SFO) and San Jose International airports — as 



^The complete dataset is available for download http ://dashlink. arc . 
nasa.gov| 
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AIRPORT DIAGRAM 



Fig. 1. San Francisco airport diagram with take off and landing 
direction in the west configuration 

well as many smaller airports. The NCT is the fourth 
busiest terminal area in the US | 3 | with an average of 
133,000 flight instrument operations per month in 2006. 
The data, made of the position and speed of aircraft, is 
organized by flight and also contains metadata for each 
flight that include: type of operation (departure/arrival), 
origin and destination airports, aircraft type (business, 
jet, helicopter, other, etc), date and time of beginning of 
record, duration of the record, etc. 

Using the available metadata, visual flight rules (VFR) 
traffic is discarded, since it is more unpredictable and 
does not follow the same rules as instrument flight rules 
(IFR) traffic. The metadata is used to sort trajectories 
by airport and operation type, i.e. take off or landing. 
After a visual analysis of the flight patterns for the 
different airports, it was decided to focus the study on 
the landings at SFO. It is the busiest airport in the 
NCT and the arrival tracks present the most interesting 
patterns by their numbers and variety. The most frequent 
configuration is the "West" configuration, where aircraft 
land on runways 28L/R and take off from runways 
19L/R. A diagram of SFO is presented in Figure [T] and 
Figure [2] depicts the NCT traffic patterns typically used 
in the west configuration. 

In this paper, the axes are set by the radar, located 
at (0,0,0). The x and y axis define the horizontal plan 
and z the vertical direction, positive going upward. To 
each recorded flight, corresponds an aircraft i and a 
trajectory T^, z = 1 . . . n, where n is the total number 
of trajectories of interest in the dataset. Each trajectory 

is a X 4 matrix, and the line TI of Ti is the radar 
echo, given by T-^ = (x-, z-, t-), where {x\,y\,z\) is 




Fig. 2. NCT standard traffic patterns, west configuration, image 
courtesy of Federal Aviation Administration 

the 3 dimensional coordinates of aircraft i at time t\. 
The trajectories have different numbers of points m^, 
varying from 10 to about 550 points, depending on 
the duration of the trajectory. Trajectories with a few 
datapoints correspond to trajectories on the boundaries of 
the dataset, or to short flights from San Jose International 
Airport or Oakland Internation Airport to SFO. The 
interval between points is between 4 and 5 seconds and 
is given by the rotational speed of the radar (most likely 
4.8 sec). The time stamp t\ is rounded to the nearest 
second. 

II. Trajectory Clustering 

In this section, two trajectory clustering methods are 
presented. After a review of existing trajectory clustering 
methods, a technique based on trajectories' "waypoints" 
is presented. Then, a technique based on a principal com- 
ponent analysis of resampled and augmented trajectories 
is introduced. 

A. Literature review 

The use of positioning devices such as GPS and the 
collection of data has increased over the past 15 years 
leading to an increasing number of tracking applications. 
An objective of tracking is to discover common patterns 
on the one hand, and detect outliers on the other hand. 

Piciareli et al. |4| presented an on-line trajectory clus- 
tering method for real time video surveillance. Moving 
objects such as pedestrians are identified in video frames 
and their trajectories are compared against existing clus- 
ter representatives, that is, an average of all the trajecto- 
ries in the cluster. The match between a trajectory and a 
cluster is determined using the mean of the normalized 
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distances of every trajectory point to the nearest point of 
the cluster representative. If a match is found, the cluster 
representative is updated. If not, a new cluster is created. 
In this approach, the cluster representatives evolve with 
time. This clustering method was used by Dahlbom and 
Niklasson for coastal surveillance but failed to provide 
satisfactory results when dealing with real data sets |5J 
such as ship trajectories. 

Lee et al. O presented a partition-and-group frame- 
work for trajectory clustering. Trajectories are parti- 
tionned in subtrajectories. Subtrajectories are represented 
by line segments and grouped using a distance function. 
The distance function incorporates three components that 
measure the perpendicular distance, the parallel distance 
and the angular distance between the line segments. The 
clustering algorithm is density based, i.e clusters are 
created where the density of points is the highest. The 
formulation is powerful but the results are presented on 
very noisy data where it is difficult to visually cluster 
the trajectories. There exists no well-defined measure to 
assess the results of the clustering method. Based on the 
same distance measure, Lee et al. |7 1 present a trajectory 
outlier detection procedure. The results are presented on 
the same noisy datasets and therefore difficult to evaluate 
visually. 

Vlachos et al. used similarity functions based on the 
longest common subsequence (LCS) to discover simi- 
lar multidimensional trajectories |8|. Their LCS based 
clustering method appears to be more efficient than 
Euclidean distance based measures and dynamic time 
warping distance functions, especially in the presence of 
noise. Dynamic time warping allows the tool to measure 
distance between sequences which may vary in time or 
speed. 

Eckstein proposed an automated flight track taxon- 
omy f9| . The trajectories are first resampled, then 
clustered using /c-means on a reduced order model. The 
model reduction is the truncation of a proper orthogonal 
decomposition (POD), also called principal components 
analysis. The trajectories are clustered using only the 
first two modes of the decomposition, as they capture 
95% of the fluctuations of the dataset used. 



A waypoint is characterized by its GPS coordinates 
and, sometimes, an altitude indication. The planar 
localization of a waypoint is very accurate but its 
vertical component often looks like "at or above — ft". 
Vertical clearances are delivered by ATC and trajectories 
vertical profiles are then at the discretion of the pilots. 
Therefore, this method focuses on the 2D coordinates 
of the waypoints in the (x^y) plan. This method is 
an efficient way to determine the compliance of flown 
trajectories with published procedures. Nevertheless, 
published procedures cannot be used because of the 
limited number of waypoints or reporting points located 



in the TRACON. In Section III we further show this by 
comparing the results of the trajectory clustering with 
the published waypoints. 

The objective is to identify and group the turning 
points into "waypoints". A turning point is a point in 
the trajectory where the aircraft changes heading. Then 
trajectories are represented by a sequence of waypoints. 
Finally, trajectories are clustered using the Longuest 
Common Subsequence (LCS). The algorithm proceeds 
using the following steps and it is summarized in Figure 

m 

1) Identify the location of the turning points of each 
trajectory. 

2) Cluster of the set of all the turning points of all the 
trajectories. This clustering task is done using k- 
means QOl, Gil orDBSCAN |12| (Density-Based 
Spatial Clustering of Applications with Noise). 



Section |II-B2| gives an overview of those algo- 
rithms. This clustering provides a finite number 
of waypoints where it has been determined that 
aircraft usually turn. 

3) Represent each trajectory by its sequence of turn- 
ing points. 

4) Cluster the sequences of waypoints using the Se- 
quenceMiner algorithm 1 13], |14|. SequenceMiner 
provides us with a representative trajectory for 
each cluster. 



B. Waypoint based trajectory clustering 

This section presents a novel algorithm for aircraft 
trajectory clustering. This algorithm takes advantage of 
aircraft trajectory properties: aircraft usually fly straight, 
with a limited number of turns. This method arises 
from the current instrument flight rules procedures. 
When approaching an airport, aircraft usually follow 
published procedures made of a sequence of waypoints. 



1) Turning Points Identification: The first step is 
to extract the location of the turning points of each 
trajectory. To simplify the notations, the aircraft index 
i is omitted in the following equations. The head- 
ing of an aircraft at time can be estimated by 

= arctan |iqn— ^j^, at each point of the trajectory, 
/ = 2 . . . m — 1, where m is the total number of points. 
Since the trajectory is a bit noisy, a low pass filter is 
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Clusters of trajecto ries 



Cluster using SequenceMiner 



Trajectory as a sequence of waypoiiits 

W, = {C4,C8,Ci,...} 



Trajectory as a sequence of waypoints 

l^^ = {C2,C6,Ci,...} 



\Identification of way points followed by a trajectory 



Waypoints: Clusters of turning points 
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Cluster using fc-means or DBSCAN/ 
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Set of all turning points 



5, = {tpj . . . } 



= {tp| . . . } 



) ^ Identification of turning points 



Trajectory of aircraft i: Ti 
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Fig. 3. Waypoint clustering method 

applied: 

= at/j^ ^ {1 - a)i)^-\ 



/ = 2...m- 1, 



(1) 
(2) 



where a is a constant for the filter. On this data, setting 
a = OA provided good noise filtering results and not 
too much delay. A turning point tp is identified when 
the heading difference between two consecutive values 
of the heading exceed a threshold: — ^^~^| > 
The threshold was chosen relatively small in order 
to capture small heading changes but not too small 
not to capture meaningless heading changes variations: 
= 0.025rad = 1.43°. This value was set experimen- 
tally. The results are not very sensitive to a small change 
in ^c- The number of turning points is trimmed to avoid 
long sequences when aircraft are executing large turns: 
if two consecutive turning points are determined, the fist 
one only is kept; if three, only the midlle one, etc. 

The trajectory of aircraft i is now represented as a 
sequence of turning points Si : 

5i = {tpi...tpf}, 

where tp| is the 3D coordinate of a turning point. The 
first point of the trajectory is labeled as a turning point. 



Figure [4] presents a sample of 11 trajectories and the 
points identified as turning points. 



Trajectories and turning points 
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Fig. 4. Trajectories and identified turning points 

Denote by S the set of all the turning points for all 
the trajectories: S = {Si . . . Sn}- 

The second step is to cluster the set S of turning 
points. The following section introduces the two clus- 
tering algorithms used in this paper: /c-means ifTSl and 
DBSCAN |12|. 

2) Algorithms overview: k-means and DBSCAN: 
a) Overview of k-means l[T5\l : This paragraph 
presents a brief overview of the k-means algorithm. 
For more details, the reader is refered to |11|. Given 
a set 5 = (tpi, . . . tp|5|) of \S\ observations (turning 
points in our case), where each observation is a d- 
dimensional real vector, then k-means clustering aims at 
partitioning the l^*! observations into k sets, or clusters, 
{k < \S\)^ C = {Ci, C2, . . . , Ck} so as to minimize the 
within-cluster sum of squares: 



arg mm 
c 



k 

E E 

i=i tp-ec^ 



(3) 



where is the mean of The mean of a cluster 
is called centroid and is the center of mass of all the 
elements in the cluster. The number k of clusters is the 
only input required from the user. 



(1) 



,m 



(1) 



Starting with an initial set of k centers 
which may be specified randomly or by some heuristic, 
the algorithm proceeds by alternating between two steps, 
also known as Lloyd Algorithm |[T6ll : 
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Assignment step: Assign each observation to the 
cluster with the closest mean, that is partition the obser- 
vations according to the Voronoi diagram generated by 
the centroids of the clusters. Figure [5] presents the results 
of /c-means clustering and the corresponding Voronoi 
diagram. 



it) 



for all 



(4) 



1,...,^} 

Update step: Calculate the new means to be the 
centroid of the observations in the cluster. 

1 



m 



(t+i) 



\C 



E 



tp, 



(5) 



(*) 



The algorithm is deemed to have converged when the 
assignments no longer change. Since it is a heuristic 
algorithm, there is no guarantee that it will converge 
to the global optimum, and the result may depend on 
the initial clusters. Since the algorithm is usually very 
fast, it is common to run it multiple times with different 
starting conditions and keep the run that resulted in the 
minimum value for equation |3] 

4 Clustering of turning points using i<-means 




Fig. 5. Clusters of turnings points and corresponding Voronoi diagram 

b) Overview of DBS CAN: This paragraph presents 
a brief overview of the DBSCAN algorithm. For more 
details, the reader is refered to |17J. DBSCAN ifHll 
stands for Density-Based Spatial Clustering of Appli- 
cations with Noise. DBSCAN clusters points that are 
close together (in an e neighborhood), and surrounded 
by sufficiently many points. DBSCAN requires two 



parameters: a real,6, and the minimum number of points, 
MinPts, required to form a cluster. The e-neighborhood 
of a point p consists of all the points q s.t dist{p^ q) < e. 
If the e-neighborhood of a point p contains more than 
MinPts, a new cluster is started, with p as a core ob- 
ject. DBSCAN then iteratively collects directly density- 
reachable objects from these core objects. An object q 
is said to be directly density -reachable from an object p 
if q is in the e-neighborhood of p and p is a core object. 

If a core object g of a cluster Cq is added a cluster 
Cp, Cp and Cq are merged. When no point can be added 
to any cluster, the process terminates. 

3) Turning Points Clustering; creation of way points: 
To determine the waypoints, the turning points are 
clustered: a waypoint is defined as the planar {x^y) 
coordinates of a cluster of turning points. The idea is 
to create a waypoint where it has been determined that 
many aircraft turned. Depending on the number and on 
the density of available turning points , two different 
algorithms are used. When the spatial distribution of 
turning points is sparse, /c-means is used, and when the 
distribution of turning points is dense, DBSCAN is used. 

a) Case when the data is sparse: When the 
number of turning points is small, a density based 
clustering algorithm would provide poor results, 
identifying most of the points as outliers. Therefore, 
a distance-based algorithm is used so all the turning 
points available are used. A waypoint is created for 
each cluster produced by /c-means. Using cylindrical 
coordinates, the coordinates of the center of a 
waypoint are given by (rmthetam)- The center is the 
center of mass of all the points in the cluster. The 
coordinates of the corners of the waypoints are given by 

{{rm + 2stdr, Om + 2stde), (r^ - 2stdr, Om + 2stde), (r^ - 2stdr, Or, 

where stdr and side are the standard deviation of the 

radial coordinates and angular coordinates of the 

points in the cluster, respectively. Figure [6] presents the 

outcome of clustering the waypoints for one day of 

trajectories. Each cluster is represented using a different 

color/shape combination. The waypoints and represented 

by pairs of nested polygons on the figure. The inside 

polygon corresponds to (r^ ± stdr^ Om ± stdg) and the 

outside one to (r^ ± 2stdr-,0m ± 2stde). The number 

is the label of the cluster. 

b) Case when the data is dense: When the the 
number of turning points is large, a large share of the 
airspace is covered with turning points. A distance based 
algorithm such as /c-means provides meaningless clusters 
for our application. Figure |5] shows the clusters provided 
by /c-means and the corresponding Voronoi diagram for 
the turnings point of almost 3 month of data ( 30,000 
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Clustering using Kmeans 
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Fig. 6. Result of the clustering of the turning points for one day 

trajectories). 

To overcome this issue, the turning points were clus- 
tered using DBSCAN. DBSCAN is particularly efficient 
to cluster data in the presence of noise. Waypoints 
are created using the convex hull of the clusters re- 
sulting from DBSCAN. Figure |7] shows the result of 
the clustering of the turning points using DBSCAN. 
The blue polygons represent the waypoints. All the 
points identified as outliers, i.e not associated with any 
waypoint, are not depicted. The parameters used were 
e = 350m and minPts = 10. The main issue with 
DBSCAN is its execution time since its complexity is in 
0{n log n). Here, the number of turning points to cluster 
is n = 118, 179 for 30,000 trajectories. 

4) Converting a trajectory into a sequence of way- 
points: The waypoints have been discovered using the 
turning points of the trajectories. Nevertheless, some 
trajectories might go over waypoints without actually 
turning. To identify the sequence of waypoints followed 
by a trajectory, the following procedure is used for each 
trajectory: start with an empty sequence of waypoints, 
and given the set of all waypoints, run the trajectory 
along its original direction. If one of the points is located 
over a waypoint, the waypoint is added to the sequence. 
Each trajectory is now represented as an ordered se- 
quence of waypoints, where the number of waypoints 
is finite. The next step is to cluster the trajectories 
determining the longest common subsequence (LCS) of 
waypoints. 

5) Longest Common Subsequence determination: The 
sequences of waypoints are clustered using the longest 



Turning points ciustering using DBSCAN 
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Fig. 7. Clusters of turning points using DBSCAN. Outliers are not 
displayed 



common subsequence. The LCS problem is to find the 
longest subsequence common to all sequences in a set of 
sequences. SequenceMiner |[T3l , (W \ is an algorithm that 
identifies the LCS and generates clusters of sequences. 
This method allows to cluster of sequences that do not 
contain the same number of elements. When sequences 
have only a small number of points, say fewer than 3, 
this clustering method does not work well. Therefore, 
only the sequences containing more than 4 waypoints 
are kept. The total number of waypoints being small, it 
is preferable to focus on the waypoints at the begining 
of the trajectory: since most aircraft do a final turn to 
get aligned with the runway, this turning point does not 
bring much information about the trajectory. Therefore, 
if the last turning point is in the large brown cluster 
(Figure [7]), it is removed from the sequence. 

Figure [8] presents the results of the clustering process 
using /c-means and LCS on a low number of trajectories. 
The dataset used is the tracks of all the aircraft landing at 
San Francisco (SFO) airport on February 10, 2006. Only 
the trajectories of that day were used to determine the 
waypoints. Each cluster is represented by a color. The 
algorithm identifies the main flows but a few trajectories 
seem not to belong to the expected cluster. The quality 
of the results is subjective and can only be visually 
assessed. Figure [9] presents the results for an initial set of 
30,000 trajectories, using DBSCAN and LCS. Here, the 
denomination "Nominal" qualifies the trajectories con- 
taining more than 4 waypoints. The colors corresponds 
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Fig. 8. Results of trajectory clustering for the landings of one day at Fig. 10. Trajectory clustering method based on Principal Components 
SFO Analysis 
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Fig. 9. Results of trajectory clustering for 30,000 trajectories 

to the clusters. They differ on figures [8] and [9] because 
the indexing of clusters is random and depends on the 
order of the data in the dataset. 

e clus Overall, this method presents good clustering 
results. One of the main drawbacks of this method is that 
it only keeps the trajectories going over the waypoints. 
For instance, consider two parallel trajectories: one going 
over the waypoints and the other one slightly off. The 
latter will be considered as an outlier eventhough it is 



very similar to the first trajectory, resulting in excluding 
many trajectories. In addition, trajectories containing 
large rerouting periods will belong to the clusters as long 
as they pass over waypoints. 

C. Trajectory-based clustering via component analysis 
This method proceeds with the following steps, which 



are summarized in a diagram in figure 10 

1) Resample the trajectories, to obtain time series of 
equal length for each aircraft. 

2) Augment the dimensionality of the data. 

3) Normalize and concatenate all the data into a 
single vector for each flight. 

4) Run a principal components analysis (PGA) and 
keep the first 5 principal components (PCs). 

5) Cluster using a density-based clustering algorithm. 

This paper proposes improvements to the approach 
used by Eckstein |9| to realize a trajectory taxonomy. 
In 191, trajectories are first resampled, then the principal 
components are extracted and finally, the clustering is 
realized using /c-means on the projections onto the first 
two principal components. Figure |ll(a)| presents the 
resulting clusters on the principal components and on 
the trajectories, using the methods introduced in |9|. The 
clustering technique proposed in |[9l does not provide 
result precise enough for our data set and there is no 



identification of outliers. Figure |ll(b)| presents a 3D 
view of the projection onto the first three PCs (section 
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(a) Clusters of trajectories 
3D plot of the first 3 PCs of the trajectories after clustering 
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(b) Clusters of the first principal components 
Fig. 11. Clustering results using the method presented in [9] 



II-C3| ) so it can be compared with our method. Eckstein 
used only the first two PCs for clustering. The first 
improvement is to augment the dimensionality of the 
data. Then, the PCs are computed and the projections 
of the augmented trajectories onto the first five PCs are 
clustered using a density based clustering algorithm. This 
algorithm presents the advantage of identifying outliers. 
Another advantage is that the number of clusters is not 
set a priori. 



1 ) Data Cleaning and Formatting: The dataset is well 
organized and fairly clean. Trajectories with fewer than 
50 points are removed from the dataset: to be able 
to use a clustering algorithm such as DBSCAN, each 
trajectory must be represented as a vector. All vectors 
must have the same number of elements n, so their 
distance can be computed. Since all trajectories do not 
have the same number of points, resampling is necessary. 
Trajectories are resampled so that the total number of 
points for each trajectory is 50. For the sole purpose of 
clustering, fewer than 50 points would have been enough. 
Nevertheless, to improve the accuracy of the airspace 
monitoring function presented in section |lll| 50 points 
were used. The resampled trajectory T/ is given by 
j.samp = {T^j = {round(^), k = l... 50}}. Dur- 
ing this operation, the notion of speed that was given 
by the distance between the radar echos is lost. 

2) Dimensionality augmentation: To improve the re- 
sults of the clustering, the dimensionality of the data 
was increased. Some of the added dimensions present 
symmetry with respect to a point or a line and some do 
not. 

• Cartesian position of the aircraft in the resampled 



trajectory: P = [xj 



,50 



].Pis 



a row vector with 150 components. This vector is 
unique to each trajectory. 

Distance from the center of the TRACON 
R = {ri = v/RF+W+W, 1 = 1... 50}. 
Provides information about the rate of convergence 
of the aircraft toward the center of the TRACON, 
which is located close to the airport. This distance 
presents a symmetry with respect to the center 
of the TRACON, i.e two trajectories that are 
symmetric with respect to the center of the tracon 
will be represented with the same vector R. 
Distance from the top left corner: D = 

{d\ = ^{x\-Xref)^+\yl-yrefr + {zi)^,l = 

1...50}, where (xref^Vref) are the coordi- 
nates of the top left corner a square containing 
the TRACON. The top left corner has coordi- 
nates (xrefjUref) = (— 80, 80)/cm. This distance 
presents a symmetry with respect of the diagonal 
joining the top left comer (—80, 80) and the bottom 
right comer ((80,-80)), i.e two trajectories that 
are symmetric with respect to this diagonal will be 
represented with the same vector D. 
Angular position in cylindrical coordinates: 
6 = {6>^ = arctan(^),/ = 1...50}. A constant 
value and slow rate of change indicate a straight 
trajectory while high variablity indicates curved 
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trajectories. Does not present any symmetry. 
• Heading of the aircraft ^ = {V^-, / = 1 . . . 50}. The 
computation of the heading was done using the filter 
of equation [T] and then resampled to 50 points. This 
vector is unique to each trajectory. 

The sine and cosine values of the angular position and 
heading are used instead of their actual value to avoid the 
discontinuity at ±7r. Each trajectory is now represented 
by a vector of dimension 450 given by: T^'^^^ = 
[P R D cos(e) sin(6) cos(^) sin(^)]. 
The initial vector had dimension 150. The values of 
each parameter are normalized between and 1 in 
order to balance their importance during the clustering 
process. It was decided to add meaningful data such 
as heading or rate of convergence toward the center 
of the TRACON. For instance, two aircraft on parallel 
trajectories will fly the same heading, even if the 
trajectories are slightly apart from each other. Distance 
to the center was chosen to identify trajectories that 
have particular patterns such as vectoring and holding 
pattern: the distance to the center will present some 
irregularities as the aircraft flies back and forth. Such 
irregularities will be highlighted by dimensions such as 
the heading that will change 180° while the position 
will only change slightly. 

3) Principal Components Analysis: A principal com- 
ponents analysis |18| is run on matrix that contains 
all the resampled trajectories. Each trajectory is then 
projected onto the first p principal components and is 
now represented by a vector of p values. The choice in 
the value of p is a trade-off between computational speed 
when p is small and accuracy when p gets larger. There 
is no need to get a value of p too large since the first 
principal components contain most of the information. 
Different values of p were tried and p = 5 gave a 
satisfactory level of accuracy for this type of data. The 
added dimensions increse the range of the projection 
of the trajectories onto the principal components. This 
makes the clustering task easier as the clusters are 
"further apart" in the principal components space. 

4) Clustering: The projections of the trajectories onto 
the first 5 PCs are clustered using DBSCAN. A density 
based clustering algorithm like DBSCAN is prefered to 
a distance based algorithm because of the shape of the 
clusters that can be arbitrary. The other advantage of 
DBSCAN is the identification of outHers. Figure 1 12(b) 



(Figure [T2(a)). 



presents the resulting clusters. The axis corresponds to 
the value of the first 3 principal components. Clusters 
are clearly differentiated, even if they are not easy to 
distinguish on the plot due to the perspective effect. The 
resulting clusters of trajectories are visually very clean 
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(a) Clusters of trajectories 
3D plot of the first 3 PCs of the trajectories after clustering 
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(b) Clusters of the first 5 principal components 

Fig. 12. Clustering results using resampling, data augmentation, PC A 
decompostion, and DBSCAN on the first 5 principal components. 



Figure 13 presents the centroids, that is the center of 
mass of the trajectories of each cluster. Those centroids 
can be seen as "standard procedures". Some clusters are 
minor variations from each other, such as the flights 
coming from the bottom left corner. This comes from 



the the settings used for DBSCAN. On Figure 12(b) 



one can clearly identy clusters of points. The algorithm 
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was run with a high sensitivity (e small and minPts 
large). The parameter e reflects the similarity between 
trajectories (the smaller the more similar), and minPts 
is the number of "similar" trajectories needed to create 
a new cluster (Section |II-B3b|). A small e generates 



"narrow" cluster while a larger e will generate clusters 
with more variability in trajectories. In the application 
presented in section [Illj the algorithm is run with a 
lower sensitivity and provides fewer clusters, with larger 
variability. The resulting centroids of this run can be seen 
on Figure [2Q| 

Centroids for all Clusters 




Trajectories identified as outliers 
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Fig. 14. Trajectories identified as outliers 



Jets Business Regional 



Jets Business Regional 



0.4 0.6 
X-axis, meters 



Fig. 15. Repartition of outliers by aircraft category 



Fig. 13. Clusters centroids (average trajectory) 



5) Analysis of outliers: Figure 14 shows the outliers 
detected by the clustering algorithm. Outliers represent 
19.5% of all trajectories. A visual inspection shows that 
the main reasons for being detected as an outlier is the 
presence of holding patterns, large vectoring maneuvers 
or direct routes. 



Figure [T5]presents the number and frequency of outlier 
trajectories as a function of the type of aircraft. Jets 
represent the largest share in numbers, but the frequency 
is much smaller. Among the trajectories of regional and 
business aircraft, 4% and 5% are identified as outliers, 
respectively. A possible explanation is the size, the speed 
and the maneuverability of the aircraft. To ensure a safe 
separation at the runway threshold, air traffic controllers 
"vector" aircraft, that is give a sequence of headings 
to follow. The vectors given to business and regional 
aircraft might be different and sharper than the vectors 
given to larger size jets. 



day of study. Each bar represents one day. The minimum 
percentage of outliers is less than 2% and goes up to 
16%. The most likely explanation is the weather. San 
Francisco airport usually operates with two close parallel 
runways. The runways are not independent, that is, they 
cannot be operated simultaneously when the weather 
does not permit visual approaches. When a runway 
is closed, the landing capacity is reduced from 60 to 
30 aircraft per hour. Schedules and operations usually 
take the weather into account , but unexpected late fog 
dissipation or other type of convective weather might 
disrupt the operations and force controllers to vector 
aircraft and put them on holding patterns. 



Figure 16 presents the frequency of outliers for each 



Figure [17] presents the frequency of outliers as a 
function of the time of the day. The local time is reported 
on the abscissa axis, starting at midnight. This diagram 
is an average over the entire period of interest. The 
frequency of outliers is higher during the night, then 
decreases in the early morning, to in increase again with 
a peak at 11a.m.. Another peak is visible at 5 p.m.. 
The outliers identified at night are mostly due to direct 
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Fig. 16. Histogram of outliers, day by day 
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Fig. 17. Histogram of outliers, hour by hour, local time 



routing that is allowed by the very low traffic density 
at night. During the morning, traffic density increases 
and requires more rerouting for efficient sequencing 
and merging. Another possible explanation is the late 
dissipation of the fog. 

III. Airspace monitoring 

This section proposes an airspace monitoring tech- 
nique that automatically detects when an aircraft is not 
conforming to standard procedures. Standard procedures 
are determined using the centroids of the clusters found 
using the previous clustering methods. Centroids corre- 
spond to flight path often flown and the value of the 
parameters used for clustering allows the trajectories to 
vary more around the centroids. The objective of the 
monitoring task is to detect in when an aircraft deviates 
from nominal path in real time. 

A. Literature review and motivation 

Krozel |19| proposed an intent based monitoring 
where the aircraft is tracked relative to a filed flight 
plan, using NavAids and waypoints. The monitoring 
tasks requires knowledge of the airspace structure, of 
the trajectory waypoints and of the intent of the aircraft. 
It is a powerful tool when the flights behave according 
to their flight plan, but when dealing with arrivals, the 
sequence of waypoints might change, some might be 
skipped or added to ensure an optimal separation of 
aircraft at the runway threshold and vectoring is often 
used. This monitoring method cannot be used. Reynolds 



et al. f20l introduced a framework for the development 
of an automated conformance monitoring system. The 
system described in |20| has two main inputs: the 
conformance basis, containing target states and trajectory 
information, and the observation of a surveillance sys- 
tem. Those inputs feed models for pilots intents, aircraft 
intents, and aircraft control systems and dynamics. Those 
models provide an expected state vector that is compared 
with the observed state vector for conformance analysis. 
This structure is further used in [21 1 to monitor the 
conformance of a trajectory to a flight plan. For instance, 
it detects if an aircraft does not turn, turns too early 
or too late at a waypoint. The monitoring is based on 
intent and knowledge of the the exact expected trajectory 
is required. An offline trajectory analysis and taxonomy 
for arrival trajectories was proposed by Eckstein |9|. The 
objective of Eckstein is to analyze the performance of 
area navigation (RNAV) operations for NextGen con- 
cepts of operations offline. The method in f9l uses GPS 
coordinates of actual waypoints to identify and classify 
segments of trajectories. 



Figure [18] displays an aerial view of the San Francisco 
Bay Area. The blue circle represents the outer boundary 
of the TRACON, given by the area covered by the 
radar. The white lines are the centroids identified in 
section [ir-C4[ The yellow dots are waypoints or reporting 



points. The locations of those points come from Stan- 
dard Terminal Arrival Routes (STAR) and track logs 
|22|. The centroids of the clusters pass over only a 
limited number of waypoints. This shows that using 
the published waypoints and reporting points cannot be 
efficiently used to monitor traffic in the TRACON. The 
intent based methods cannot be used in the terminal 
area. Figure [18] also displays the arrival from the north 
for turboprops. This arrival procedure has not been 
identified by the clustering algorithm because of the 
relatively small number of aircraft using this route, 
and of the variability of the flight path following this 
procedure. A real time trajectory analysis tool built upon 
the knowledge gathered from the clustering analysis is 
now proposed. The tool is called AirTrajectory Miner 
(ATM) since it enables the monitoring of operations in 
the TRACON. Current aircraft trajectories are compared 
against nominal trajectories, that is the trajectories in the 
clusters. If they differ too much, the current trajectory is 
tagged as abnormal, or outlier. The only intent used is the 
aircraft final destination airport. The tool automatically 
detects if the aircraft is flying one of the possible 
approaches, including most commonly used vectoring 
maneuvers. 
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Fig. 18. Centroids of the clusters and reporting points/way points for 
SFO arrivals 



B. Data formatting 

It is not possible to directly compare the current trajec- 
tories with nominal trajectories, since current trajectories 
are incomplete. During real-time system operations, only 
past data are known. Therefore, the nominal dataset is 
fragmented. Resampled trajectories that had 50 points 
are split in 10 fragments of 5 points. The average travel 
time in the TRACON for aircraft landing at SFO is about 
14 minutes. Therefore, 5 data points correspond to about 
14*60/10 = 84 seconds. A memory of 80 seconds is 
used for current tracks. The radars hits of the last 80 
seconds of flight are resampled to 5 points that can now 
be compared against the database of nominal tracks. This 
process is done using the Inductive Monitoring System. 



Aircraft outside 
tlie TRACON 



Aircraft 
in tine TRACON 



Surveillance System 
(Secondary 
RADAR, ADS-B,...) 



Dataset of trajectories 
identified as nominal 



Aircraft to moi 




Fig. 19. Schematic view of the air traffic control system in and around 
the TRACON - Integration of AirTrajectory Miner 



The training dataset corresponds all the trajectories 
identified as nominal and fragmented in 10 segments of 
5 points. The total number of segments was 276,040. 



C Anomaly detection: Inductive Monitoring System 

To detect anomalous trajectories, the Inductive Moni- 
toring Health System (IMS) |[23l is used. IMS is a good 
alternative to model based health monitoring systems. It 
provides a high fidelity detection tool, and there is no 
need to manually build a model. IMS runs in two steps: 
learning phase and anomaly detection. IMS learns the 
nominal behaviors using a training dataset provided by 
the user. IMS builds clusters using /c-means clustering 
and density-based clustering. During the anomaly detec- 
tion phase, the input data is compared with knowledge 
base built from the training data. The anomaly score 
can be interpreted as the distance to the nearest cluster. 
The input data belongs to a cluster if all the parameteres 
values are within the range specified by the cluster limits. 



D. AirTrajectoryMiner: Monitoring tool 

AirTrajectoryMiner (ATM) is a real time TRACON 



monitoring tool. Figure 19 shows how ATM could be 
incorporated into the air traffic management environ- 
ment. The inputs to the tool are the set of all the 
trajectories identified as nominal, work resulting from 
and the radar tracks of the flights of 



II-C4 



section 

interest. ATM delivers two types of ouputs. On the one 
hand, it delivers an indication of conformance of current 
flight to nominal procedures, and on the other hand, it 
delivers a measure of the complexity in the TRACON 
that can be incorporated in Traffic Management Advisor 
(TMA) softwareEl. 



Figure 20 displays the monitoring environment. The 
top frame is a 2D view of the airspace. The records cover 
a cylinder of radius 80km, going from the ground up to 
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Fig. 20. AirTrajectory Miner display. Top frame: conformance to 
standard procedure. Bottom frame: time history of complexity in the 
TRACON. 



6000 m. The following provides some information about 
the display and associated aircraft count. 

• Green circle: aircraft intended to land at SFO 
(associated count: usfo)- 

• Grey square: aircraft not intended to land at SFO 
(associated count: n-^^Q). 

• Green segment: trajectory of an aircraft intended 
to land at SFO and following the procedures. 

• Red segment: trajectory of an aircraft intended to 
land at SFO and whose trajectory is identified as 
an outlier: it does not to follow the procedures 
(associated count: n-Qj^ sfo^- 

• Grey segment: trajectory of an aircraft not intended 
to land at SFO and whose trajectory does not 
interfere with traffic landing at SFO. 

• Orange segment: trajectory of an aircraft not in- 
tended to land at SFO and whose trajectory may 
interfere with traffic intended to land at SFO (asso- 
ciated count: Ur 

• Colored 



'OK.SFO^- 

lines: Centroids 



of the clusters of tra- 
jectories identified as nominal. The centroids pre- 



sented differ from the ones on Figure 13 because 
the clustering algorithm was re-run with different 
parameters allowing more variability and therefore 
creating fewer clusters. 

Aircraft intent information comes from the data. The 



length of the line following the aircraft corresponds to 
the part of the trajectory being analyzed, that is the 
last 80 seconds of the trajectory. The length of the line 
is therefore proportional to the velocity of the aircraft. 
This display is intended for an air traffic controller 
managing the arrivals at SFO. A similar display would 
be used for managing other arrivals or departures. The 
only change would be the training data for IMS and the 
centroids displayed. Aicraft with a grey segment can be 
ignored, since they are not landing at SFO and are not 
interfering with landing traffic at SFO. Aircraft with a 
green segment are following the standard procedures to 
land at SFO. Aircraft in orange require special attention 
since they are not intended to land at SFO but present 
characteristics that identify them as "in the pattern to 
land at SFO". They conform with some of the SFO 
landing trajectories. Aircraft in red also need special 
attention since they are supposed to land at SFO but 
currently not on standard tracks. The controller needs to 
make sure they are not generating conflicts or interfering 
with other traffic. Based on the compliance of current 
flights to procedures, we define a measure of complexity 
for the TRACON, which could provide an automatic 
feedback of the health of the TRACON to the traffic 
flow manager who regulates the flow of aircraft arriving 
in the TRACON. 

E. Measure of complexity 

Complexity in air traffic management is a widely 
studied topic ||25]| . A measure of airspace complexity 
is called dynamic density |26| and was intended to 
understand the effect of changing airspace configura- 
tion and traffic controller workload. It is a function of 
the traffic density and the number of aircraft changing 
heading, speed or altitude, and, the separation between 
aircraft. This measure is a weighted sum of several 
parameters and the weights have been determined us- 
ing human in the loop experiments. The model was 
fitted to the observations of controllers. Delahaye and 
Puechmorel |27 | propose a measure of complexity based 
on the Lyapunov exponents of a time varying vector 
field that interpolates aircraft position and velocities. 
This intrinsic complexity measure reflects the stability 
of the traffic configuration. Lee L28 1 propose complexity 
measure based on the response of the airspace to a 
disturbance. Disturbance can be an intruder aircraft in the 
airsapce and the corresponding measure of complexity is 
the deviation required by the other aircraft to solve all the 
conflicts. Gariel et Feron |29| proposed complexity maps 
based on the degradation of communication, navigation 
and surveillance capacities. The degradation results in a 
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required increase in separation distances creating new 
potential conflicts. The complexity measures the dif- 
ficulty to steer the traffic from the nominal mode of 
operation with initial separation distances to degraded 
mode of operation with increased separation distances. 

This paper introduces a new complexity metric, based 
on the compliance of aircraft to procedures identified 
as nominal. According to 1301 . (Jl l, controllers build 
a mental model of nominal operations. The complexity 
of a traffic configuration perceived by the controllers 
increases when aircraft flight path do not follow this 
mental model. When operations are running as expected, 
the controller is more efficient and can deal with more 
aircraft. Thus, increasing the number of aircraft not 
following nominal procedure will reduce the maximum 
number of aircraft a controller can deal with simul- 
taneously, reducing the capacity of the airspace. The 
proposed complexity measure is based on Shannon's 
theory of communication \3T\. Using the aircraft counts 
introduced earlier, the instantaneous probability of an 
aircraft inbound for SFO to be identified as nominal is 



p{OK\SFO) 



nOK.SFO 



(6) 



nsFO 

For the aircraft inbound for SFO and identified as out- 
liers, it is assumed that each outlier aircraft is unique and 
independent from other aircraft. At each instant, each 
outlier is considered a different from the other outliers, 
that is there are types of outliers. Therefore, 

the probability of an aircraft to be a specific outlier is 

1 
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The entropy Isfo of the aircraft inbound to SFO is 
therefore 

Isfo = - p{OK\SFO) \ogp(OK\SFO) . . . 

'^OK^,SFO 

- p(OK^SFO)\ogp(OK^SFO) 
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The same reasoning is used for aircraft not inbound 
to SFO: 



"-SFO 



''OK, SFO 



log 



''OK, SFO 



log- 



(9) 



The proposed measure of complexity C is the sum of 
the entropy of aircraft inbound to SFO and the entropy 
of aircraft not inbound to SFO. 



C = I 



SFO 



'SFO- 



(10) 



This complexity measure is an indication of the dis- 
order with regard to nominal operations. If no aircraft is 
identified as an outlier, the complexity is 0. The com- 
plexity increases with the number of outliers detected, 
but also with the number of aircraft. The bottom left 



plot of Figure 20 shows this measure of the complexity 
over the last 10 minutes. The plot is refreshed every 15 
seconds. When the traffic flow manager sees that the 
complexity increases, ATM provides information about 
the operations in the TRACON. If the complexity gets 
high, the controller in charge of the TRACON is likely to 
have a high workload. Providing the traffic flow manager 
with this complexity measure can help him to manage the 
flow of arriving aircraft. A low complexity suggests that 
more aircraft can be allowed in the TRACON. Increasing 
complexity suggests that the TRACON controllers are 
subject to an important workload and that the aircraft 
arrival rate should be reduced. 

An aircraft can be detected as "off nominal track" 
because the controller may require a large vectoring ma- 
neuver, meaning that the runways are congested. Another 
explanation for "off nominal" tracks can be rerouting due 
to weather. This tool can also be used as an automatic 
independent monitoring tool. Intent based tools |2T1 
cannot be used in terminal areas since controllers give 
vectors that do not appear in the flight plan. Moreover, 
there are many turns and altitude changes that are left 
to the pilot to execute. 

Conclusion 

This paper presented two trajectory clustering methods 
and an application to airspace monitoring. The monitor- 
ing tool compare the conformance of current flights to 
identified nominal procedures in real-time. The version 
of the tool presented in this paper monitors the landings 
at SFO, but it can easily be modified to monitor any 
traffic pattern, by modifying the input dataset. 
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